# process_historical_market_breadth.py

> Historical Market Breadth Calculator - Generates time-series breadth metrics for charting and trend analysis

## Overview

`process_historical_market_breadth.py` calculates day-by-day market breadth indicators across the entire stock universe, generating a historical time-series dataset for the Market Breadth Dashboard charts.

<Info>
  **Pipeline Position:** Phase 4 - Historical analytics generation

  **Critical Function:** Powers breadth trend charts with 250 days of advance/decline, SMA breadth, and momentum indicators
</Info>

## Purpose

This script:

* Processes 250 trading days of historical OHLCV data for all tracked stocks
* Calculates daily breadth metrics (advances, declines, SMA breadth, etc.)
* Merges stock breadth with major index price data
* Outputs a row-based CSV file (one metric per row, one date per column) for dashboard consumption

## Input Files

<ParamField path="all_stocks_fundamental_analysis.json" type="JSON" required>
  Master stock list to determine which symbols to process
</ParamField>

<ParamField path="ohlcv_data/*.csv" type="CSV" required>
  Individual stock OHLCV files with columns: Date, Open, High, Low, Close, Volume
</ParamField>

<ParamField path="indices_ohlcv_data/NIFTY.csv" type="CSV" required>
  Nifty 50 OHLCV data used to establish the master timeline (last 250 trading days)
</ParamField>

<ParamField path="indices_ohlcv_data/*.csv" type="CSV" required>
  Index OHLCV files for:

  * NIFTY\_MIDCAP\_150.csv
  * NIFTY\_SMALLCAP\_250.csv
  * NIFTY\_MIDSMALLCAP\_400.csv
  * NIFTY\_500.csv
</ParamField>

## Output Files

<ResponseField name="market_breadth.csv" type="CSV">
  Row-based CSV with each metric as a row and dates as columns

  Format:

  ```csv
  Type of Info,2025-05-15,2025-05-16,2025-05-17,...
  Up by 4% Today,23,45,12,...
  Down by 4% Today,8,15,5,...
  5 Day Ratio,1.45,1.52,1.38,...
  Above 200MA %,68.5,69.2,70.1,...
  Nifty 50,22450.30,22523.15,22601.80,...
  ```
</ResponseField>
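
For downstream analysis, the row-based layout can be transposed into a conventional date-indexed table. The following is a minimal sketch, assuming pandas and the `Type of Info` header label shown above; the helper name `load_breadth` is hypothetical and not part of the script:

```python
import io
import pandas as pd

def load_breadth(source):
    """Load the row-based breadth CSV and return a date-indexed DataFrame."""
    # The first column holds metric names; the remaining columns are dates.
    wide = pd.read_csv(source, index_col="Type of Info")
    # Transpose so each row is a date and each column is a metric.
    by_date = wide.T
    by_date.index = pd.to_datetime(by_date.index)
    by_date.index.name = "Date"
    return by_date

# Tiny in-memory sample in the same layout as market_breadth.csv.
sample = io.StringIO(
    "Type of Info,2025-05-15,2025-05-16\n"
    "Up by 4% Today,23,45\n"
    "5 Day Ratio,1.45,1.52\n"
)
breadth = load_breadth(sample)
print(breadth["5 Day Ratio"])
```

Passing a file path instead of the `StringIO` object should work the same way, since `pd.read_csv` accepts both.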

<ResponseField name="market_breadth.json.gz" type="JSON (gzipped)">
  Compressed JSON version of the breadth data (currently a placeholder in the code)
</ResponseField>

## Processing Logic

### 1. Master Timeline Establishment

Uses Nifty 50's last 250 trading days as the reference timeline:

```python
LOOKBACK_DAYS = 250

nifty_path = os.path.join(INDEX_OHLCV_DIR, "NIFTY.csv")
nifty_df = pd.read_csv(nifty_path)
timeline = nifty_df['Date'].tail(LOOKBACK_DAYS).tolist()
date_to_idx = {date: i for i, date in enumerate(timeline)}
num_days = len(timeline)
```
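
Taking the last rows with `.tail()` relies on `NIFTY.csv` being sorted oldest-first, and the later lookups rely on its `Date` strings matching those in the stock files exactly. A self-contained sketch of the same steps with an explicit sort is shown here; `build_timeline` is a hypothetical helper name, and the in-memory frame stands in for the real file:

```python
import pandas as pd

LOOKBACK_DAYS = 250

def build_timeline(index_df, lookback=LOOKBACK_DAYS):
    """Return the last `lookback` dates (oldest first) and a date -> position map."""
    # ISO-formatted date strings (YYYY-MM-DD) sort chronologically as plain text.
    dates = index_df["Date"].sort_values().tail(lookback).tolist()
    return dates, {date: i for i, date in enumerate(dates)}

# Stand-in for NIFTY.csv, deliberately out of order.
demo = pd.DataFrame({
    "Date": ["2025-05-16", "2025-05-14", "2025-05-15"],
    "Close": [3.0, 1.0, 2.0],
})
timeline, date_to_idx = build_timeline(demo, lookback=2)
print(timeline)
print(date_to_idx)
```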

### 2. Breadth Matrices Initialization

Creates one 1-D NumPy array per metric, with one counter per timeline day:

```python
# Per-day counters (one slot per timeline day), incremented once per qualifying stock
advances = np.zeros(num_days)
declines = np.zeros(num_days)
above_200ma = np.zeros(num_days)
above_50ma = np.zeros(num_days)
above_20ma = np.zeros(num_days)
above_10ma = np.zeros(num_days)
up_4pc = np.zeros(num_days)
down_4pc = np.zeros(num_days)
high_52w = np.zeros(num_days)
low_52w = np.zeros(num_days)
vol_plus = np.zeros(num_days)
vol_minus = np.zeros(num_days)
```

### 3. Stock-Level Processing

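For each stock, the script computes rolling indicators over the full price history, then increments the per-day counters for every row that falls on the timeline. Comparisons against an indicator that is still `NaN` (too little history for the window) evaluate to `False`, so that stock is not counted for that metric on that day:

```python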
for csv_path in csv_files:
    symbol = os.path.basename(csv_path).replace(".csv", "")
    if symbol not in valid_symbols: continue
    
    # Re-read full history for technicals to avoid edge effects
    full_df = pd.read_csv(csv_path)
    full_df['SMA_10'] = full_df['Close'].rolling(10).mean()
    full_df['SMA_20'] = full_df['Close'].rolling(20).mean()
    full_df['SMA_50'] = full_df['Close'].rolling(50).mean()
    full_df['SMA_200'] = full_df['Close'].rolling(200).mean()
    full_df['Vol_SMA_20'] = full_df['Volume'].rolling(20).mean()
    full_df['H_52W'] = full_df['High'].rolling(252).max()
    full_df['L_52W'] = full_df['Low'].rolling(252).min()
    full_df['Prev_Close'] = full_df['Close'].shift(1)
    full_df['Daily_Ret'] = ((full_df['Close'] - full_df['Prev_Close']) / full_df['Prev_Close']) * 100

    # Filter back to timeline
    analysis_df = full_df[full_df['Date'].isin(timeline)]
    
    for _, row in analysis_df.iterrows():
        idx = date_to_idx.get(row['Date'])
        if idx is None: continue
        
        # Metrics Calculation
        if row['Close'] > row['Prev_Close']: advances[idx] += 1
        if row['Close'] < row['Prev_Close']: declines[idx] += 1
        
        if row['Close'] > row['SMA_200']: above_200ma[idx] += 1
        if row['Close'] > row['SMA_50']: above_50ma[idx] += 1
        if row['Close'] > row['SMA_20']: above_20ma[idx] += 1
        if row['Close'] > row['SMA_10']: above_10ma[idx] += 1
        
        if row['Daily_Ret'] >= 4: up_4pc[idx] += 1
        if row['Daily_Ret'] <= -4: down_4pc[idx] += 1
        
        if row['High'] >= row['H_52W']: high_52w[idx] += 1
        if row['Low'] <= row['L_52W']: low_52w[idx] += 1
        
        if row['Volume'] > row['Vol_SMA_20']: vol_plus[idx] += 1
        else: vol_minus[idx] += 1
```
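
Row-by-row iteration with `iterrows()` is easy to follow but slow on a large universe. The same flags can be computed column-wise and added to the counters in one step. The sketch below covers only the advance/decline counters; `add_stock_flags` is a hypothetical helper, not part of the script:

```python
import numpy as np
import pandas as pd

def add_stock_flags(full_df, date_to_idx, advances, declines):
    """Add one stock's advance/decline flags to the per-day counters in place."""
    df = full_df.copy()
    df["Prev_Close"] = df["Close"].shift(1)
    # Keep only rows whose date is on the master timeline.
    df = df[df["Date"].isin(list(date_to_idx))]
    idx = df["Date"].map(date_to_idx).to_numpy(dtype=int)
    # Comparisons against NaN (the first row has no previous close) are False.
    up = (df["Close"] > df["Prev_Close"]).to_numpy()
    down = (df["Close"] < df["Prev_Close"]).to_numpy()
    # np.add.at accumulates correctly even if a day index repeats.
    np.add.at(advances, idx, up.astype(float))
    np.add.at(declines, idx, down.astype(float))

timeline = ["2025-05-15", "2025-05-16"]
date_to_idx = {d: i for i, d in enumerate(timeline)}
advances = np.zeros(len(timeline))
declines = np.zeros(len(timeline))

# Illustrative data: one extra day before the timeline supplies the previous close.
stock = pd.DataFrame({
    "Date": ["2025-05-14", "2025-05-15", "2025-05-16"],
    "Close": [100.0, 101.0, 99.0],
})
add_stock_flags(stock, date_to_idx, advances, declines)
print(advances, declines)
```

The other metrics could follow the same pattern, with one boolean column and one `np.add.at` call per counter.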

### 4. Advance/Decline Ratio Calculation

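Computes each rolling A/D ratio as the sum of advances divided by the sum of declines over a trailing window. For the first `window - 1` days the window is truncated to the days available, and the ratio falls back to `1.0` when the window contains no declines:

```python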
def calc_ratio(adv, dec, window):
    r = []
    for i in range(len(adv)):
        start = max(0, i - window + 1)
        sum_adv = sum(adv[start:i+1])
        sum_dec = sum(dec[start:i+1])
        ratio = round(sum_adv / sum_dec, 2) if sum_dec > 0 else 1.0
        r.append(ratio)
    return r

rows.append(to_csv_row("5 Day Ratio", calc_ratio(advances, declines, 5)))
rows.append(to_csv_row("10 Day Ratio", calc_ratio(advances, declines, 10)))
```
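
The same calculation can be expressed with pandas rolling sums. This is a sketch intended to reproduce the behaviour of `calc_ratio`, including the shortened early windows and the `1.0` fallback; `calc_ratio_rolling` is a hypothetical name and the input values are illustrative:

```python
import numpy as np
import pandas as pd

def calc_ratio_rolling(adv, dec, window):
    """Rolling advance/decline ratio with the same fallbacks as calc_ratio."""
    # min_periods=1 lets the first days use a shorter window.
    sum_adv = pd.Series(adv).rolling(window, min_periods=1).sum()
    sum_dec = pd.Series(dec).rolling(window, min_periods=1).sum()
    # Mask zero-decline windows to NaN, then replace the resulting NaN with 1.0.
    ratio = (sum_adv / sum_dec.where(sum_dec > 0)).round(2).fillna(1.0)
    return ratio.tolist()

advances = np.array([10.0, 20.0, 30.0, 0.0])
declines = np.array([0.0, 10.0, 20.0, 0.0])
ratios = calc_ratio_rolling(advances, declines, 2)
print(ratios)
```

With a window of 2, the first day has no declines and takes the fallback, while the last day still sees the previous day's 20 declines in its window.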

### 5. CSV Assembly

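Assembles the final CSV with one metric per row and one date per column. The `to_csv_row` helper and the `index_data` mapping are defined elsewhere in the script and are not shown here:

```python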
rows = []
rows.append("Type of Info," + ",".join(timeline))

# Momentum Indicators
rows.append(to_csv_row("Up by 4% Today", up_4pc.astype(int)))
rows.append(to_csv_row("Down by 4% Today", down_4pc.astype(int)))

# A/D Ratios
rows.append(to_csv_row("5 Day Ratio", calc_ratio(advances, declines, 5)))
rows.append(to_csv_row("10 Day Ratio", calc_ratio(advances, declines, 10)))

# Breadth Percentages
total_tracked = max(processed_count, 1)
rows.append(to_csv_row("Above 200MA %", np.round(above_200ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 50MA %", np.round(above_50ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 20MA %", np.round(above_20ma / total_tracked * 100, 1)))
rows.append(to_csv_row("Above 10MA %", np.round(above_10ma / total_tracked * 100, 1)))

# 52-Week Extremes
rows.append(to_csv_row("Reached 52w High", high_52w.astype(int)))
rows.append(to_csv_row("Reached 52w Low", low_52w.astype(int)))

# Volume
rows.append(to_csv_row("Volume greater than 20Day Average", vol_plus.astype(int)))
rows.append(to_csv_row("Volume less than 20Day Average", vol_minus.astype(int)))

# Raw Counts
rows.append(to_csv_row("Advances", advances.astype(int)))
rows.append(to_csv_row("Declines", declines.astype(int)))

# Index Prices
for label, prices in index_data.items():
    rows.append(to_csv_row(label, prices))
```
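
Since `to_csv_row` is not shown in the excerpt, the following is one plausible shape for it, offered as an assumption rather than the script's actual implementation. It joins a label and its values into a single comma-separated line:

```python
import numpy as np

def to_csv_row(label, values):
    """Format a metric label and its per-day values as one CSV line (assumed helper)."""
    # str() on NumPy scalars yields plain numbers such as "23" or "68.5".
    return label + "," + ",".join(str(v) for v in values)

timeline = ["2025-05-15", "2025-05-16"]  # illustrative dates
rows = ["Type of Info," + ",".join(timeline)]
rows.append(to_csv_row("Up by 4% Today", np.array([23.0, 45.0]).astype(int)))
rows.append(to_csv_row("Above 200MA %", np.round(np.array([137.0, 138.0]) / 200 * 100, 1)))

csv_text = "\n".join(rows) + "\n"
print(csv_text)
```

Plain string joining is adequate here because none of the metric labels contain commas; labels with commas or quotes would call for the standard `csv` module instead.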

## Output Metrics

### Momentum Indicators

<ResponseField name="Up by 4% Today" type="integer[]">
  Daily count of stocks whose close-to-close return is +4% or more
</ResponseField>

<ResponseField name="Down by 4% Today" type="integer[]">
  Daily count of stocks whose close-to-close return is -4% or lower
</ResponseField>

### Advance/Decline Ratios

<ResponseField name="5 Day Ratio" type="float[]">
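  Sum of advances divided by sum of declines over the trailing 5 sessions

  * Values > 1.0 indicate bullish breadth
  * Values \< 1.0 indicate bearish breadth
  * A value of exactly 1.0 is also emitted when the window contains no declines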
</ResponseField>

<ResponseField name="10 Day Ratio" type="float[]">
  Sum of advances divided by sum of declines over the trailing 10 sessions, with the same interpretation as the 5 Day Ratio
</ResponseField>

### Moving Average Breadth

<ResponseField name="Above 200MA %" type="float[]">
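  Daily percentage of stocks closing above their 200-day SMA. The denominator is the total number of processed stocks, so stocks without enough history for the SMA count as not above it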
</ResponseField>

<ResponseField name="Above 50MA %" type="float[]">
  Daily percentage of processed stocks closing above their 50-day SMA
</ResponseField>

<ResponseField name="Above 20MA %" type="float[]">
  Daily percentage of processed stocks closing above their 20-day SMA
</ResponseField>

<ResponseField name="Above 10MA %" type="float[]">
  Daily percentage of processed stocks closing above their 10-day SMA
</ResponseField>

### 52-Week Extremes

<ResponseField name="Reached 52w High" type="integer[]">
  Daily count of stocks whose high is the highest high of the trailing 252 sessions (current session included)
</ResponseField>

<ResponseField name="Reached 52w Low" type="integer[]">
  Daily count of stocks whose low is the lowest low of the trailing 252 sessions (current session included)
</ResponseField>
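
Because the rolling window includes the current session, the check `High >= H_52W` is true exactly when the current high is the highest in the window. Here is a scaled-down sketch with a 3-session window in place of 252; the data and the window size are illustrative only:

```python
import pandas as pd

WINDOW = 3  # stands in for the 252-session window used by the script

df = pd.DataFrame({"High": [10.0, 12.0, 11.0, 13.0, 12.5]})
df["H_MAX"] = df["High"].rolling(WINDOW).max()
# The first WINDOW - 1 rows have no full window, so H_MAX is NaN and the flag is False.
df["New_High"] = df["High"] >= df["H_MAX"]
new_high_flags = df["New_High"].tolist()
print(df)
```

A stock therefore needs at least a full window of history before it can register a new high or low at all.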

### Volume Metrics

<ResponseField name="Volume greater than 20Day Average" type="integer[]">
  Daily count of stocks whose volume exceeds their 20-day average volume
</ResponseField>

<ResponseField name="Volume less than 20Day Average" type="integer[]">
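  Daily count of all other stocks: volume at or below the 20-day average, including stocks with too little history for the average to be defined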
</ResponseField>
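
The `else` branch in the counting loop places every row that is not above its average in the second bucket, including rows whose average is still `NaN`. The scaled-down sketch below (3-session window, illustrative data) separates the cases; the stricter `at_or_below` variant is an alternative for comparison, not what the script does:

```python
import pandas as pd

df = pd.DataFrame({"Volume": [100.0, 200.0, 300.0, 150.0]})
df["Vol_SMA"] = df["Volume"].rolling(3).mean()  # the script uses a 20-session window

above = df["Volume"] > df["Vol_SMA"]
# What the if/else in the loop counts as "less than average".
not_above = ~above
# Stricter alternative: only rows with a defined average that are at or below it.
at_or_below = df["Vol_SMA"].notna() & ~above

counts = (int(above.sum()), int(not_above.sum()), int(at_or_below.sum()))
print(counts)
```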

### Index Prices

<ResponseField name="Nifty 50" type="float[]">
  Daily closing prices for Nifty 50
</ResponseField>

<ResponseField name="Nifty 500" type="float[]">
  Daily closing prices for Nifty 500
</ResponseField>

<ResponseField name="Nifty Midcap 150" type="float[]">
  Daily closing prices for Nifty Midcap 150
</ResponseField>

<ResponseField name="Nifty Smallcap 250" type="float[]">
  Daily closing prices for Nifty Smallcap 250
</ResponseField>

<ResponseField name="Nifty Midsmallcap 400" type="float[]">
  Daily closing prices for Nifty Midsmallcap 400
</ResponseField>

## Usage Example

```bash
python process_historical_market_breadth.py
```

**Expected Output:**

```
⏳ Loading master stock list...
Targeting 2847 stocks for historical breadth.
🧬 Processing stock-level history...
✅ Analyzed 2847 stocks. Merging with Index data...
🚀 Market Breadth Historical Data generated: /path/to/market_breadth.csv
```

## Performance Optimization

* Stores each metric as a 1-D NumPy array with one counter per timeline day, so the counters' size depends on the timeline length rather than on the number of stocks
* Computes rolling indicators once per stock over its full history, which also avoids edge effects at the start of the timeline
* Restricts the per-row counting loop to rows that fall on the 250-day timeline

## Data Quality Notes

<Warning>
  **SMA Edge Effects Prevention**: The script computes rolling indicators on each stock's full historical CSV and only then filters to the 250-day timeline. Computing them on the filtered window alone would leave the first 199 days of the 200-day SMA undefined and understate breadth at the start of the timeline.
</Warning>
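
The effect is easy to reproduce on a small series. In this scaled-down sketch a 3-session SMA and a 4-session lookback stand in for the 200-day SMA and the 250-day timeline; the prices are illustrative:

```python
import pandas as pd

WINDOW = 3    # stands in for the 200-day SMA
LOOKBACK = 4  # stands in for the 250-day timeline

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# Approach used by the script: roll over the full history, then keep the last LOOKBACK rows.
sma_full = close.rolling(WINDOW).mean().tail(LOOKBACK)

# Naive approach: truncate first, then roll. The first WINDOW - 1 values are NaN.
sma_truncated = close.tail(LOOKBACK).rolling(WINDOW).mean()

print(sma_full.tolist())
print(sma_truncated.tolist())
```

The first approach yields a defined average for every day in the lookback, while the second leaves the first two days undefined.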

<Note>
  **Placeholder Metrics**: Some metrics like "Up by 25% in Month" and "Nifty 500 % of W\&M RSI > 60" are currently placeholders (zeros) and may be implemented in future versions.
</Note>

## Related Scripts

* [fetch\_indices\_ohlcv.py](/api/fetch-indices-ohlcv) - Fetches index OHLCV data required for processing
* [process\_market\_breadth.py](/api/process-market-breadth) - Generates current-day sector breadth analytics
* [bulk\_market\_analyzer.py](/api/bulk-market-analyzer) - Creates the master stock list
